NCGNN: Node-level Capsule Graph Neural Network
Message passing has evolved as an effective tool for designing Graph Neural
Networks (GNNs). However, most existing works naively sum or average all the
neighboring features to update node representations, which suffers from the
following limitations: (1) lack of interpretability to identify crucial node
features for GNN's prediction; (2) over-smoothing issue where repeated
averaging aggregates excessive noise, making features of nodes in different
classes over-mixed and thus indistinguishable. In this paper, we propose the
Node-level Capsule Graph Neural Network (NCGNN) to address these issues with an
improved message passing scheme. Specifically, NCGNN represents nodes as groups
of capsules, in which each capsule extracts distinctive features of its
corresponding node. For each node-level capsule, a novel dynamic routing
procedure is developed to adaptively select appropriate capsules for
aggregation from a subgraph identified by the designed graph filter.
Consequently, as only the advantageous capsules are aggregated and harmful
noise is restrained, over-mixing features of interacting nodes in different
classes tends to be avoided to relieve the over-smoothing issue. Furthermore,
since the graph filter and the dynamic routing identify a subgraph and a subset
of node features that are most influential for the prediction of the model,
NCGNN is inherently interpretable and exempt from complex post-hoc
explanations. Extensive experiments on six node classification benchmarks
demonstrate that NCGNN effectively addresses the over-smoothing issue and
outperforms state-of-the-art methods by producing better node embeddings for
classification.
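To make the routing idea concrete, the following is a minimal sketch of agreement-based dynamic routing over a filtered neighborhood, written in PyTorch. It is not the authors' implementation: the squash non-linearity follows standard capsule networks (Sabour et al., 2017), and the `neighbors` argument is a hypothetical stand-in for the subgraph produced by NCGNN's graph filter.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Capsule squashing non-linearity (Sabour et al., 2017): shrinks short
    # vectors toward zero and long vectors to just under unit length.
    norm2 = (s * s).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

def capsule_routing(node_caps, neighbors, num_iters=3):
    """Aggregate neighbor capsules into each node's output capsules by agreement.

    node_caps: (num_nodes, C, D) tensor, C capsules of dimension D per node.
    neighbors: list of LongTensors; neighbors[i] holds the indices of node i's
               filtered subgraph (including i itself) -- a hypothetical stand-in
               for the output of NCGNN's graph filter.
    """
    num_nodes, C, D = node_caps.shape
    out = torch.empty_like(node_caps)
    for i in range(num_nodes):
        cand = node_caps[neighbors[i]].reshape(-1, D)  # (M, D) candidate capsules
        b = torch.zeros(C, cand.size(0))               # routing logits
        for _ in range(num_iters):
            c = F.softmax(b, dim=1)                    # coupling coefficients
            v = squash(c @ cand)                       # (C, D) aggregated capsules
            b = b + v @ cand.t()                       # reward agreeing capsules
        out[i] = v
    return out
```

Because the coupling coefficients concentrate on capsules that agree with the emerging output, dissimilar (potentially harmful) neighbor features receive small weights rather than being averaged in, which is the mechanism the abstract credits for relieving over-smoothing.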
Scene Graph Lossless Compression with Adaptive Prediction for Objects and Relations
The scene graph is a new data structure describing objects and their pairwise
relationships within image scenes. As the size of scene graphs in vision
applications grows, how to losslessly and efficiently store such data on disk
or transmit it over a network becomes an unavoidable problem. However, scene
graph compression has seldom been studied because of the complicated data
structures and distributions involved. Existing solutions usually rely on
general-purpose compressors or graph structure compression methods, which are
weak at reducing redundancy in scene graph data. This paper introduces a new
lossless compression framework with adaptive predictors for joint compression
of objects and relations in scene graph data. The proposed framework consists
of a unified prior extractor and specialized element predictors to adapt for
different data elements. Furthermore, to exploit the context information within
and between graph elements, Graph Context Convolution is proposed to support
different graph context modeling schemes for different graph elements. Finally,
a learned distribution model is devised to predict numerical data under
complicated conditional constraints. Experiments conducted on labeled or
generated scene graphs prove the effectiveness of the proposed framework on
the scene graph lossless compression task.
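The principle driving this framework, that sharper conditional predictions translate directly into shorter lossless codes, can be seen in a toy sketch. It replaces the learned prior extractor and Graph Context Convolution with a simple adaptive count model; the shared label alphabet and a coder reaching the ideal -log2(p) length are assumptions for illustration, not the paper's method.

```python
import math
from collections import Counter

def adaptive_code_length(symbols):
    """Bits to losslessly encode `symbols` under an adaptive count model,
    a crude stand-in for the paper's learned element predictors. Each symbol
    is coded with the model fit to the already-seen prefix; a decoder that
    knows the alphabet can mirror the updates, so no side information is sent."""
    counts = Counter({s: 1 for s in set(symbols)})  # add-one (Laplace) prior
    bits = 0.0
    for s in symbols:
        p = counts[s] / sum(counts.values())        # conditional probability
        bits += -math.log2(p)                       # ideal arithmetic-code length
        counts[s] += 1                              # adapt the context model
    return bits

labels = ["person", "dog", "person", "leash", "person", "dog"]
print(f"{adaptive_code_length(labels):.2f} bits")   # shrinks as the model adapts
```

A predictor that additionally conditions on graph context, as the proposed Graph Context Convolution does, assigns higher probabilities to the true elements and therefore yields shorter codes.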
Frequency-Aware Transformer for Learned Image Compression
Learned image compression (LIC) has gained traction as an effective solution
for image storage and transmission in recent years. However, existing LIC
methods exhibit redundancy in their latent representations due to limitations
in capturing anisotropic frequency components and preserving directional details. To
overcome these challenges, we propose a novel frequency-aware transformer (FAT)
block that, for the first time, achieves multiscale directional analysis for
LIC. The FAT block comprises frequency-decomposition window attention (FDWA)
modules to capture multiscale and directional frequency components of natural
images. Additionally, we introduce a frequency-modulation feed-forward network
(FMFFN) to adaptively modulate different frequency components, improving
rate-distortion performance. Furthermore, we present a transformer-based
channel-wise autoregressive (T-CA) model that effectively exploits channel
dependencies. Experiments show that our method achieves state-of-the-art
rate-distortion performance compared to existing LIC methods, and clearly
outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, and 13.0%
in BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively.
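What "anisotropic (directional) frequency components" means can be illustrated with explicit FFT sector masks. This is only a picture of the decomposition: the actual FDWA module realizes it with windowed attention rather than Fourier masks, and `num_angles` is an illustrative parameter, not one from the paper.

```python
import math
import torch

def directional_frequency_bands(x, num_angles=4):
    """Split a feature map x (B, C, H, W) into directional frequency bands.
    The angular masks partition the spectrum, so the bands sum back to x:
    an explicit (illustrative) form of anisotropic frequency decomposition."""
    _, _, H, W = x.shape
    X = torch.fft.fft2(x)                               # complex spectrum
    fy = torch.fft.fftfreq(H).view(H, 1).expand(H, W)   # vertical frequencies
    fx = torch.fft.fftfreq(W).view(1, W).expand(H, W)   # horizontal frequencies
    theta = torch.atan2(fy, fx) % math.pi               # orientation in [0, pi)
    bands = []
    for k in range(num_angles):
        lo = k * math.pi / num_angles
        hi = (k + 1) * math.pi / num_angles
        mask = ((theta >= lo) & (theta < hi)).to(X.dtype)
        bands.append(torch.fft.ifft2(X * mask).real)    # keep one angular sector
    return bands

x = torch.randn(1, 3, 32, 32)
bands = directional_frequency_bands(x)
assert torch.allclose(sum(bands), x, atol=1e-5)         # lossless partition
```

Since natural-image statistics differ across such directional bands, modulating them separately, as FMFFN does for frequency components, can remove redundancy that an isotropic treatment of the latent would miss.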